Bioinformatics An Introduction 4th Edition (Jeremy Ramsden)


Transcriptomics and Proteomics

The closer the partition coefficient is to unity, the “harder” (i.e., the better separated)
the clustering.

Instead of using a clustering approach, the dimensionality of expression space can
be reduced by principal component analysis (PCA), in which the original dataset is
projected onto a small number of orthogonal axes. The original axes are rotated until
there is maximum variation of the points along one direction. This becomes the first
principal component. The second is the axis along which there is maximal residual
variation, and so on (see also Sect. 13.2.2).
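The rotation described above can be illustrated with a minimal sketch. It assumes that rows are genes and columns are experimental conditions, and it finds each axis by power iteration on the covariance matrix, removing the variation already explained before looking for the next axis. The function names and the sample numbers are illustrative only; in practice a library eigen-solver or singular value decomposition would normally be used instead.

```python
import math
import random


def centre(rows):
    """Subtract the column means so that the data cloud is centred on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centred):
    """Sample covariance matrix of already-centred data."""
    n, d = len(centred), len(centred[0])
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_axis(matrix, iterations=200, seed=0):
    """Power iteration: the unit axis of maximum variation and the variance along it."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary non-zero starting axis
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variation left to find
            break
        v = [x / norm for x in w]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    variance = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return variance, v


def principal_components(rows, k=2):
    """Return the first k (variance, axis) pairs and the data projected onto them."""
    centred = centre(rows)
    cov = covariance(centred)
    d = len(cov)
    axes = []
    for _ in range(k):
        variance, v = leading_axis(cov)
        axes.append((variance, v))
        # Deflation: subtract the variation already explained by this axis,
        # so that the next pass finds the axis of maximal residual variation.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(c[j] * v[j] for j in range(d)) for _, v in axes]
              for c in centred]
    return axes, scores


# Made-up expression levels: four genes measured under three conditions.
expression = [[2.1, 4.0, 1.0], [3.9, 8.1, 1.2], [6.0, 12.2, 0.9], [8.1, 15.8, 1.1]]
axes, scores = principal_components(expression, k=2)
for variance, axis in axes:
    print(round(variance, 3), [round(x, 3) for x in axis])
```

Each row of `scores` gives the coordinates of one gene in the reduced space, which is what would be plotted or passed to further analysis.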

Limitations and Alternatives

Microarrays have some limitations, and one should note the following potential
sources of problems: manufacturing reproducibility; variation in how the experiments
are carried out [exposure duration (is equilibrium reached?), temperature
gradients, flow conditions, and so on, all of which may severely affect the actual
amounts hybridized]; ambiguity between preprocessed and postprocessed (spliced)
mRNA; mRNA fragment size distribution not matching that of the probes; quantitative
interpretation of the data; expense. Attempts are being made to introduce
globally uniform standards, namely minimum information about a microarray experiment
(MIAME), in order to make comparison between different experiments possible.

Other techniques have been developed, such as serial analysis of gene expression
(SAGE). In this technique, a short but unique sequence tag is generated from the
mRNA of each gene using PCR (Sect. 17.1.2), and the tags are joined together
(“concatemerized”). The concatemer is then sequenced. The frequency with which each
tag appears in the sequence is proportional to the expression level of the
corresponding gene.
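The counting step of SAGE can be sketched as follows. This is a simplified illustration, not the actual protocol: it assumes that all tags have the same fixed length, that the sequenced concatemer contains no spacer or linker sequences, and that a lookup table from tag to gene is already available. The names `count_tags`, `relative_expression` and `tag_to_gene`, and the four-base tags, are hypothetical.

```python
from collections import Counter


def count_tags(concatemer, tag_length):
    """Split a sequenced concatemer into fixed-length tags and count each one."""
    if tag_length <= 0 or len(concatemer) % tag_length != 0:
        raise ValueError("concatemer length must be a multiple of tag_length")
    return Counter(concatemer[i:i + tag_length]
                   for i in range(0, len(concatemer), tag_length))


def relative_expression(tag_counts, tag_to_gene):
    """Convert tag counts into each gene's share of all recognised tags."""
    gene_counts = Counter()
    for tag, count in tag_counts.items():
        gene = tag_to_gene.get(tag)
        if gene is not None:  # tags absent from the lookup table are ignored
            gene_counts[gene] += count
    total = sum(gene_counts.values())
    if total == 0:
        return {}
    return {gene: count / total for gene, count in gene_counts.items()}


# Made-up example: three genes, each identified by a four-base tag.
tag_to_gene = {"AAAC": "geneA", "GGTT": "geneB", "CCGA": "geneC"}
concatemer = "AAACGGTTAAACCCGAAAACGGTT"
counts = count_tags(concatemer, tag_length=4)
print(relative_expression(counts, tag_to_gene))
```

The share of tags attributed to each gene serves as the estimate of its relative expression level.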

The transcription products of many closely related genes such as those originating
from alternative mRNA splicing (Sect. 14.8.5) may be difficult to distinguish using
standard microarray techniques; efforts to overcome that problem include the use of
bundles of tens of thousands of optical fibres, to the ends of which thousands of glass
beads, each loaded with a particular DNA sequence, are fixed.⁸ Since the beads are
comparable in size (a few micrometres in diameter) with the optical fibre cores, each
fibre will carry at most one active bead. Each fibre is individually addressable and the
DNA sequence associated with it is first identified using fluorescent complementary
DNA fragments. The attraction of the technique is the enhanced sensitivity.

Problem. How many n-mers are needed to unambiguously identify g genes?
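One possible reading of this problem is as a counting bound on the tag length. There are 4^n distinct sequences of length n over the four bases, so giving each of g genes its own tag requires 4^n ≥ g. The sketch below computes the smallest such n. It is only a lower bound, under the assumption that every gene can be assigned a distinct n-mer; real sequences need not contain suitably distinct tags. The function name `min_tag_length` is illustrative.

```python
def min_tag_length(num_genes, alphabet_size=4):
    """Smallest n such that alphabet_size**n >= num_genes (counting bound only)."""
    if num_genes < 1:
        raise ValueError("num_genes must be at least 1")
    n = 0
    while alphabet_size ** n < num_genes:
        n += 1
    return n


# Made-up gene count for illustration.
print(min_tag_length(25000))
```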

8 Yeatley et al. (2002). These researchers combined their fibre optic array with the technique of
RNA-mediated annealing, selection, and ligation (RASL). In RASL, the mRNAs produced in a
particular cell type are extracted and mixed with DNA oligomers whose sequences are
complementary to those at which two RNA sections could be joined by splicing (“splice
junctions”). The presence of a particular splice junction leads to binding of the DNA
oligomers, which can then be amplified, fluorescently labelled, and exposed to the optical
fibre array, with which the sequences can be identified.